After successfully completing the exercises on tidy data and sitting through lengthy lectures on data formats and the specifics of importing them, it’s now your turn to practice importing data in the tidyverse.

We prepared some datasets, for example the Titanic dataset from Kaggle, which you can use to try out some of the functions from readr and related packages. You can find them in the ../data folder. However, importing data often takes only a single command. For this reason, we prepared some special tasks for these exercises.

With that said, let’s start with some easy data importing.

1

Load the titanic data.
The file format is CSV; accordingly, you need the readr library and the function read_...
library(readr)

# read_csv() guesses each column's type and prints the specification it used
titanic <-
  read_csv("../data/titanic/titanic.csv")
## Parsed with column specification:
## cols(
##   PassengerId = col_double(),
##   Survived = col_double(),
##   Pclass = col_double(),
##   Name = col_character(),
##   Sex = col_character(),
##   Age = col_double(),
##   SibSp = col_double(),
##   Parch = col_double(),
##   Ticket = col_character(),
##   Fare = col_double(),
##   Cabin = col_character(),
##   Embarked = col_character()
## )
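The printed specification is readr's guess, not something stored in the file. A minimal sketch of how that guess can be retrieved with spec() and passed back through col_types, so that a later import skips guessing. To stay self-contained, it writes a hypothetical two-row stand-in for titanic.csv (three columns only) to a temporary file instead of reading ../data/titanic/titanic.csv.

```r
library(readr)

# Hypothetical two-row stand-in for ../data/titanic/titanic.csv
path <- tempfile(fileext = ".csv")
writeLines(c(
  "PassengerId,Name,Fare",
  "1,\"Braund, Mr. Owen Harris\",7.25",
  "2,\"Cumings, Mrs. John Bradley (Florence Briggs Thayer)\",71.2833"
), path)

# First import: readr guesses the column types and reports them
titanic_small <- read_csv(path)

# spec() returns the column specification readr used for this tibble
guessed <- spec(titanic_small)
guessed

# Passing the specification back via col_types turns off guessing
titanic_small2 <- read_csv(path, col_types = guessed)
```

Printing `guessed` shows a cols(...) call like the message above, which you could also copy into your script to make the column types explicit.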

You may have noticed that the function you just used imports categorical variables such as Sex as character vectors rather than factors. For some analyses, this is not what we want. So let’s pretend we’re particularly interested in gender differences, for instance in a regression model.

2

Convert the variable Sex to a factor.
You can do that while importing the data or after loading them.
titanic <-
  read_csv(
    "../data/titanic/titanic.csv",
    # Set the type of Sex explicitly; all other columns are still guessed
    col_types = cols(
      Sex = col_factor()
    )
  )
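The task also allows converting Sex after loading. A minimal sketch of that route with dplyr's mutate() and base R's factor(), run on a small hypothetical stand-in tibble rather than the real file. Listing the levels explicitly also decides which one a regression model treats as the reference category (the first level); choosing "male" here is only an illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the imported titanic tibble, Sex still character
titanic_small <- tibble(
  PassengerId = c(1, 2, 3),
  Sex = c("male", "female", "female")
)

titanic_small <- titanic_small %>%
  mutate(
    # Explicit levels: the first one ("male") becomes the reference level
    Sex = factor(Sex, levels = c("male", "female"))
  )

# returns c("male", "female")
levels(titanic_small$Sex)
```

The same levels could be fixed at import time with col_factor(levels = c("male", "female")) inside cols().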

After working on the titanic data for a while, we got bored. Now we want to work with longitudinal, cross-country data, and the gapminder GDP data comes to mind!

3

Load the gapminder GDP data.
Note that the data folder actually contains two different gapminder datasets.
library(readxl)

# The GDP file is an Excel workbook, so read_excel() from readxl replaces readr here
gapminder_GDP <-
  read_excel("../data/gapminder/GDPpercapitaconstant2000US.xlsx")
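An Excel workbook can hold several sheets, and read_excel() reads the first one unless told otherwise, so it can help to list the sheets before importing. A minimal sketch using the example workbook bundled with readxl (readxl_example("datasets.xlsx")) as a stand-in, since the sheet names of GDPpercapitaconstant2000US.xlsx are not given here.

```r
library(readxl)

# Example workbook shipped with readxl, used as a stand-in for the gapminder file
path <- readxl_example("datasets.xlsx")

# List the sheet names before deciding what to import
sheets <- excel_sheets(path)
sheets

# Read a specific sheet by name (a sheet position such as 2 also works)
mtcars_tbl <- read_excel(path, sheet = "mtcars")
```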

Although you had to use two different import functions, the outcome is the same: in both cases you get a tibble. However, the Excel format of the latter dataset is more complex than CSV, since a workbook can contain several sheets and arbitrary cell layouts. In the last exercise, we expand on this and apply some more options to the unicorn data.

4

Load the unicorn sales data file. As we are not interested in the total_turnover variable, only read in the cell range A1:C43.
You can define ranges with the option range = range_definition.
library(readxl)

# Restrict the import to the cell rectangle A1:C43, leaving out total_turnover
unicorn_sales <-
  read_excel(
    "../data/unicorns/sales.xlsx",
    range = "A1:C43"
  )
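A string like "A1:C43" pins both the rows and the columns. A minimal sketch of other range forms readxl accepts, again on its bundled example workbook because sales.xlsx is not available here: a fixed rectangle, a rectangle with the sheet name embedded, and whole columns via cell_cols(), which keeps working if rows are added later. Whether whole columns would suit sales.xlsx is an assumption that depends on what else sits in its columns A to C.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# Fixed rectangle: header in row 1 plus rows 2 to 11, columns A to C
iris_block <- read_excel(path, sheet = "iris", range = "A1:C11")

# The sheet name can also be embedded in the range string
iris_block2 <- read_excel(path, range = "iris!A1:C11")

# Whole columns A to C, however many rows the sheet has
iris_cols <- read_excel(path, sheet = "iris", range = cell_cols("A:C"))
```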